44. A processor device comprising

an instruction stream transformation unit that transforms code blocks of instructions from an
original instruction set architecture to a transformed instruction set architecture,

a regular cache that stores instructions in said original instruction set architecture,

an instruction stream cache that stores instructions in said transformed instruction set
architecture, and 

an execute unit for executing instructions, 

wherein said instruction stream transformation unit transforms said code blocks of instructions 
from said original instruction set architecture to said transformed instruction set architecture,

wherein said instruction stream cache stores said code blocks after transformation to said
transformed instruction set architecture for possible future execution, 

wherein said instruction stream cache is addressed by some of said fetch requests from said
execute unit and can potentially respond to some of said fetch requests for said code blocks after 
transformation without requiring cache hit information from said regular cache after said code 
blocks have already been transformed and stored into the instruction stream cache, and 

whereby the execution of a program code by said processor device is accelerated by transforming
portions of said program code at run-time into the transformed instruction set architecture for
more efficient execution and caching the transformed code within the instruction stream cache 
for possible repeated execution without requiring repeated transformations. 

45. A processor device as in claim 44 wherein

said execute unit can directly execute instructions in both said original instruction set architecture 
and said transformed instruction set architecture,

whereby the execute unit can execute code in said original instruction set architecture without
having to first wait for said code to be transformed into said transformed instruction set
architecture. 

46. A processor device as in claim 45 wherein said execute unit is an in-order execute unit that
retires instructions and commits their results in the same order as the instructions occur in the
code. 

47. A processor device as in claim 45 wherein said execute unit is a dynamic out-of-order
execute unit that can retire instructions out of order compared to the order of the instructions in 
the code. 

48. A processor device as in claim 44 further comprising a working memory connected to said
instruction stream transformation unit for storing intermediate calculations during the process of
transforming code from said original instruction set architecture into said transformed instruction
set architecture.

49. A processor device as in claim 44 wherein said instruction stream transformation unit
transforms blocks of code which are presumed hyper-blocks.

50. A processor device as in claim 44 wherein said instruction stream transformation unit
transforms blocks of code which are denoted as hyper-blocks in original instruction set
architecture code.

51. A processor device as in claim 44 wherein said instruction stream cache comprises means of
using a tag for each cache line to denote whether the tag is a start of a hyper-block.

52. A processor device as in claim 44 wherein said instruction stream cache comprises means of
using a hyper-block ID as an alternative way of addressing a transformed block of code.

53. A processor device as in claim 52 wherein said instruction stream cache comprises means of
storing a plurality of hyper-block lines that are chained together using a common hyper-block ID
plus pointers. 

54. A processor device as in claim 44 wherein said instruction stream cache only hits when a
transformed block of code is entered starting from the first instruction of the block. 
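
The entry-point restriction of claim 54 can be illustrated with a minimal sketch. It assumes, purely for illustration, that transformed blocks are indexed by the address of their first original instruction; the class and method names are hypothetical and do not come from the claims.

```python
class InstructionStreamCache:
    """Toy model: transformed blocks are indexed only by their start address."""

    def __init__(self):
        # Hypothetical layout: start address -> list of transformed instructions.
        self._by_start = {}

    def store(self, start_addr, transformed_block):
        """Cache a transformed block under the address of its first instruction."""
        self._by_start[start_addr] = list(transformed_block)

    def lookup(self, fetch_addr):
        """Return the block on a hit, or None on a miss.

        A fetch into the middle of a block misses, because only the
        address of the first instruction is recorded.
        """
        return self._by_start.get(fetch_addr)
```

In this sketch a miss would fall back to the regular cache and the original instruction set architecture code; that fallback is not modeled here.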

55. A processor device as in claim 44 wherein said instruction stream transformation unit
comprises means of instruction re-ordering.

56. A processor device as in claim 44 wherein said instruction stream transformation unit 
comprises means of performing predication if-conversion to convert an if-then-else construct in 
said code into a predication calculation and a series of predicated instructions that conditionally 
commit their results depending on the results of said predication calculation. 

57. A processor device as in claim 44 wherein said instruction stream transformation unit
comprises means of converting a load instruction into a speculative load and a load activation
instruction pair,

whereby the possibility of more efficient scheduling of the transformed code is enabled by 
allowing the speculative load to be scheduled earlier than a normal load could be scheduled 
without this conversion thus minimizing possible waiting during execution for this memory load 
to complete. 

58. A processor device as in claim 44 wherein said instruction stream transformation unit
comprises a parallel dependency detector circuit for detecting possible dependencies between 
instructions, 

whereby said instruction stream transformation unit can efficiently detect said possible 
dependencies during the process of instruction stream transformation. 

59. A processor device as in claim 44 wherein said instruction stream transformation unit 
transforms said code blocks by dividing long instruction sequences into sequences of a defined
maximum number of instructions called an instruction window that is equal to a number of
instructions that said instruction stream transformation unit can work with effectively.

60. A processor device as in claim 59 wherein said instruction stream transformation unit
transforms said code blocks by dividing long instruction sequences into overlapping sequences of
a defined maximum number of instructions called overlapping instruction windows, 

whereby this enables the efficient scheduling of code both within the middle of instruction 
windows and within the overlap regions between instruction windows. 
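
The division into overlapping instruction windows recited in claims 59 and 60 can be sketched as follows. The function name and the `window` and `overlap` parameters are illustrative assumptions; the claims do not specify sizes or how the overlap is chosen.

```python
def overlapping_windows(instrs, window, overlap):
    """Split a long instruction sequence into windows of at most `window`
    instructions, where consecutive windows share `overlap` instructions.

    `window` and `overlap` are hypothetical parameters chosen for this sketch.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    if not instrs:
        return []
    step = window - overlap          # how far each new window advances
    windows = []
    start = 0
    while True:
        windows.append(instrs[start:start + window])
        if start + window >= len(instrs):
            break                    # the last window reached the end
        start += step
    return windows
```

With `overlap=0` this reduces to the non-overlapping windows of claim 59.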

61. A processor device as in claim 44 wherein said instruction stream transformation unit 
comprises means of creating a dependency matrix to represent potential dependencies between 
instructions, 

whereby the dependency matrix provides dependency information in an efficiently accessed
manner for the instruction stream transformation unit. 



62. A processor device as in claim 61 wherein said instruction stream transformation unit
comprises means of creating an operand mapping table to represent potential writes of operands
by instructions,

whereby said operand mapping table enables dependencies between instructions to be detected
efficiently and for the dependency matrix to be built efficiently by said instruction stream 
transformation unit. 
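
One way the operand mapping table of claim 62 could feed the dependency matrix of claim 61 is sketched below. The instruction encoding as `(dest, sources)` tuples, the function name, and the choice to track only read-after-write and write-after-write dependencies are all assumptions made for illustration.

```python
def build_dependency_matrix(instrs):
    """Build matrix[i][j] == True when instruction i depends on earlier instruction j.

    instrs: list of (dest, sources) tuples in program order (hypothetical encoding).
    Only read-after-write and write-after-write dependencies are tracked here.
    """
    n = len(instrs)
    matrix = [[False] * n for _ in range(n)]
    # Operand mapping table: register name -> index of its most recent writer.
    last_writer = {}
    for i, (dest, sources) in enumerate(instrs):
        for src in sources:
            if src in last_writer:
                matrix[i][last_writer[src]] = True    # read-after-write
        if dest is not None:
            if dest in last_writer:
                matrix[i][last_writer[dest]] = True   # write-after-write
            last_writer[dest] = i
    return matrix
```

Each instruction needs only one table lookup per operand, so the matrix is filled in a single pass over the block.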

63. A processor device as in claim 62 wherein said instruction stream transformation unit
comprises means of performing register renaming to rename uses of registers in the transformed 
code blocks when necessary to minimize write-after-write hazards between write instructions, 

whereby scheduling dependencies caused by said write-after-write hazards is reduced. 

64. A processor device as in claim 63 wherein said register renaming by the instruction stream
transformation unit allocates a set of physical registers that are separate from a second set of
registers allocated by dynamic register renaming done by the execute unit,

whereby said instruction stream transformation unit and said execute unit are each able to 
independently allocate physical registers for register renaming without conflicting with each 
other. 

65. A processor device as in claim 44 wherein the instruction stream transformation unit
comprises means of performing an instruction scheduling algorithm to schedule a code block. 

66. A processor device as in claim 65 wherein said scheduling algorithm is a list scheduling
algorithm comprising the steps of
doing a basic forward iterative traversal to calculate the minimum cycle number of each
instruction in said block from the start of said block,

propagating the depth of each leaf instruction to all its predecessors to calculate each
instruction's priority as defined by the depth of its deepest child, and

performing a second forward iterative traversal to schedule instructions for execution cycle-by- 
cycle, where the scheduling priorities are used in conjunction with a ready list of instructions that 
are ready to be scheduled because all dependencies have been resolved. 
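
The three steps of claim 66 can be sketched in software as follows. This is a minimal illustration, not the claimed hardware: the function name, the `(pred, succ)` dependency encoding, the per-instruction `latency` list, and the issue `width` are assumptions introduced for the example, and dependencies are assumed to point forward in program order.

```python
from collections import defaultdict

def list_schedule(n, deps, latency=None, width=2):
    """Schedule n instructions; deps is a list of (pred, succ) with pred < succ.

    Returns a dict mapping instruction index -> issue cycle.
    `latency` and `width` are hypothetical parameters for this sketch.
    """
    latency = latency or [1] * n
    preds, succs = defaultdict(list), defaultdict(list)
    for p, s in deps:
        preds[s].append(p)
        succs[p].append(s)

    # Step 1: forward traversal, minimum cycle of each instruction from block start.
    earliest = [0] * n
    for i in range(n):
        for p in preds[i]:
            earliest[i] = max(earliest[i], earliest[p] + latency[p])

    # Step 2: propagate leaf depths back to predecessors;
    # priority = depth of the deepest child reachable from the instruction.
    priority = list(earliest)
    for i in reversed(range(n)):
        for s in succs[i]:
            priority[i] = max(priority[i], priority[s])

    # Step 3: second forward pass, cycle by cycle, using a ready list.
    cycle_of, finish, cycle = {}, {}, 0
    while len(cycle_of) < n:
        ready = [i for i in range(n) if i not in cycle_of
                 and all(p in finish and finish[p] <= cycle for p in preds[i])]
        ready.sort(key=lambda i: (-priority[i], i))   # highest priority first
        for i in ready[:width]:
            cycle_of[i] = cycle
            finish[i] = cycle + latency[i]
        cycle += 1
    return cycle_of
```

For a block with the chains 0 -> 1 -> 2 and 3 -> 4, a single-issue schedule (`width=1`) issues the longer chain first because its instructions carry the higher priority.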

67. A processor device as in claim 44 wherein 

said execute unit comprises means of performing dynamic memory disambiguation,

said transformed instruction set architecture supports notations for indicating ambiguous memory
operations, 

and said instruction stream transformation unit calculates said notations for indicating ambiguous 
memory operations. 

68. A processor device as in claim 67 wherein said instruction stream transformation unit
comprises means of creating an ambiguous memory dependency matrix. 

69. A processor device as in claim 67 wherein said instruction stream transformation unit
comprises means of converting an ambiguous memory read instruction into a speculative read 
and a read check instruction pair. 

70. A processor device as in claim 44 comprising a dynamically-scheduled execute unit. 

71. A processor device as in claim -

wherein the transformed instruction set architecture comprises dependency notation means of 
explicitly describing dependencies between instructions, and 

wherein the instruction stream transformation unit calculates dependency notations for said code
blocks during the process of transforming said code blocks into said transformed instruction set
architecture,


whereby the transformed code blocks can be executed without having to redetect all static
dependencies. 

72. A processor device as in claim 71 wherein said means of explicitly describing dependency
notation means allows instructions to be grouped into mini-tuples for dependency notations.


73. A processor device as in claim 71 wherein said dependencies are represented by dependency
pointers in the transformed instruction set architecture.


74. A processor device as in claim 71 wherein said dependencies are represented by dependency
vectors in the transformed instruction set architecture.

75. A processor device as in claim 44 further comprising means to perform semi-dynamic 
instruction code re-writing and re-scheduling,

whereby the code can be further optimized based on run-time information compared to code that 
is transformed only once through the instruction stream transformation unit. 

76. A processor device as in claim 44 further comprising at least one run-time history table.

77. A processor device as in claim 76 comprising a value prediction history table to record the
history of success or failure of previous executions of value predictions,

whereby run-time behavior about value predictions can be recorded in order to help optimize
subsequent instruction scheduling.

78. A processor device comprising a predicate history table to record the history of previous
executions of predicate calculation instructions, 

whereby run-time behavior about predicates can be recorded in order to help optimize subsequent 
instruction scheduling.

79. A processor device comprising 
a data cache and

a data hit/miss history table to record the history of whether previous executions of memory 
access instructions were hits or misses in said data cache, 

whereby run-time behavior about data hit/misses can be recorded in order to help optimize 
subsequent instruction scheduling. 

80. A processor device comprising an ambiguous memory conflict history table to record the 
history of whether previous executions of promoted ambiguous read instructions cause memory
conflicts or not, 

whereby run-time behavior about ambiguous memory conflicts can be recorded in order to help
optimize subsequent instruction scheduling. 

81. A method of providing precise interrupts in a processor implementing instruction stream 
transformation comprising the steps of 

mapping an original instruction set architecture instruction to an equivalent group of one or more
transformed instruction set architecture instruction(s), 

using physical registers that have not been committed to logical registers to hold results, and 

allowing the final instruction in said group to commit the physical register result(s) to logical
register(s).

82. A method of providing precise interrupts in a processor implementing instruction stream
transformation comprising the steps of 

assigning an instruction sequence number to each original instruction set architecture instruction 
starting from the beginning of the code block, 

marking each transformed instruction set architecture instruction with the corresponding 
instruction sequence number, and 

committing the results of instructions in order of the instruction sequence numbers.
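
The sequence-number approach of claim 82 can be illustrated with a small model. It assumes that sequence numbers are contiguous integers and that an original instruction commits once every transformed instruction marked with its sequence number has completed; the function name and data layout are hypothetical.

```python
def commit_order(completions, counts):
    """Return the order in which original instructions commit.

    completions: sequence numbers in the order transformed instructions finish
                 (possibly out of order).
    counts: dict mapping each sequence number to how many transformed
            instructions carry it (hypothetical bookkeeping for this sketch).
    """
    remaining = dict(counts)
    next_seq = min(counts, default=0)   # oldest uncommitted original instruction
    committed = []
    for seq in completions:
        remaining[seq] -= 1
        # Commit strictly in sequence-number order, never past an unfinished one.
        while next_seq in remaining and remaining[next_seq] == 0:
            committed.append(next_seq)
            next_seq += 1
    return committed
```

Because nothing younger than the oldest unfinished sequence number is committed, an interrupt taken at any point sees architectural state that corresponds to a prefix of the original code block.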

83. A software method of performing code scheduling comprising the step of building a 
dependency matrix to represent potential dependencies between instructions,

whereby the dependency matrix provides dependency information in an efficiently accessed 
manner for said software method to perform the code scheduling. 

84. A processor device comprising 

means of executing instructions from an instruction set architecture 

wherein the instruction set architecture can explicitly note dependencies between instructions by 
using dependency vectors. 

85. A processor device comprising

means of executing instructions from an instruction set architecture 

wherein the instruction set architecture can explicitly note dependencies between instructions by
using dependency pointers.

86. A processor device comprising

means of using software routines to transform code blocks from an original instruction set
architecture into a transformed instruction set architecture and

an instruction stream cache for storing transformed code blocks,

wherein said software routines transform said code blocks of instructions from said original
instruction set architecture to said transformed instruction set architecture, and

wherein said instruction stream cache stores said code blocks after transformation to said
transformed instruction set architecture for possible future execution,

whereby the execution of a program code by said processor device is accelerated by transforming
portions of said program code at run-time into the transformed instruction set architecture for
more efficient execution and caching the transformed code within the instruction stream cache
for possible repeated execution without requiring repeated transformations.